Abstract

We consider a set of agents who are attempting to 
iteratively learn the 'state of the world' from their 
neighbors in a social network. Each agent initially 
receives a noisy observation of the true state of the 
world. The agents then repeatedly 'vote' and ob- 
serve the votes of some of their peers, from which 
they gain more information. The agents' calculations 
are Bayesian and aim to myopically maximize the ex- 
pected utility at each iteration. 

This model, introduced by Gale and Kariv (2003), 
is a natural approach to learning on networks. How- 
ever, it has been criticized, chiefly because the agents' 
decision rule appears to become computationally
intractable as the number of iterations advances. For
instance, a dynamic programming approach (part of
this work) has running time that is exponentially
large in $\min(n, (d-1)^t)$, where n is the number of
agents.

We provide a new algorithm to perform the agents' 
computations on locally tree-like graphs. Our algo- 
rithm uses the dynamic cavity method to drastically 
reduce computational effort. Let d be the maximum 
degree and t be the iteration number. The computational
effort needed per agent is exponential only in
$O(td)$ (note that the number of possible information
sets of a neighbor at time t is itself exponential in td).

Under appropriate assumptions on the rate of convergence,
we deduce that each agent is only required
to spend polylogarithmic (in $1/\epsilon$) computational effort
to approximately learn the true state of the world
with error probability $\epsilon$, on regular trees of degree at
least five. We provide numerical and other evidence
to justify our assumption on convergence rate.

We extend our results in various directions, includ- 
ing loopy graphs. Our results indicate efficiency of 
iterative Bayesian social learning in a wide range of 
situations, contrary to widely held beliefs. 
1 Introduction

Consider a group of Facebook users who are each 
faced with the dilemma of whether to place an order
for the new iGadget or the new Gadgetoid. Each
boldly ventures to the wild and does independent re- 
search on the subject matter, discovering the "cor- 
rect" answer with some probability p > 1/2. Then, 
over the next few weeks, before making the final de- 
cision, they daily share their current opinion on the 
matter with their Facebook contacts by posting ei- 
ther iGadget or Gadgetoid on their status line. Ev- 
ery day, after learning their friends' opinions, they 
update their own by performing the Bayesian calcu- 
lation that determines which of the two options is 
more likely to be true, given all they know. Eventu- 
ally, they make a purchase based on this information. 
Such dynamics have become an integral part of elec- 
tronic commerce, and understanding them is valuable 
to social media advertisers and vendors. 

This model (or rather, a slightly more general version
of it) was introduced by Gale and Kariv [?]. It
is one in a long succession of social learning mod- 
els. Already in 1785 Condorcet [4] considered how a 
group of individuals with weak private signals could 
reach a correct collective decision; he showed that a 
majority vote is likely to be correct when the group 
is large enough. Models such as those of Banerjee [5], 
Bikhchandani, Hirshleifer and Welch [3] and Smith
and Sorensen [11] allow for each individual to make 
a single decision, learning from the decisions of her 
predecessors. The models of DeGroot [S] and Bala 
and Goyal [T] consider social networks and repeated 
interactions between agents. 

The model of Gale and Kariv combines features 
from all of the above. It describes a group of individu- 
als (or agents) , each with a private signal that carries 
information on an unknown state of the world. The 
individuals form a social network, so that each ob- 
serves the actions of some subset - her neighbors. The 
agents must choose between a set of possible actions, 
the relative merit of which depends on the state of 
the world. The agents iteratively learn by observing 
their neighbors' actions, and picking an action that 
is myopically optimal, given the information known
to them. 

Even in the simple case of two states of the world, 
two possible private signals and two actions, the re- 
quired calculations appear to be very complicated. 
This has indeed been a recurring criticism of this 
model (see, e.g., [HI [H]). One approach to this difficulty
is the bounded rationality approach of Bala
and Goyal, where agents ignore part of the information
available to them and perform a Bayesian
calculation on the rest. 

While the bounded rationality approach has led to 
impressive results, it has two disadvantages, as com- 
pared with a fully Bayesian one: first, it is bound 
to involve a somewhat arbitrary decision of which 
heuristics the agents use. Second, a game theoretic 
analysis of strategic players is possible only if the 
players choose actions that are optimal by some cri- 
terion. Hence game-theoretic analyses of learning on 
networks (e.g. [T2]) often opt for the more difficult 
but fully Bayesian model. 

A different approach to the difficulty of computa- 
tion in the Bayesian model is to show that the calcu- 
lations are in fact not as difficult as they appear, at 
least in some cases. In this paper we show that when 
the graph of social ties is locally a tree, or close to 
one, then the computational outlook is not as bleak as
previously thought. 

We first give a simple dynamic programming algorithm
for the Gale and Kariv model that is exponential
in the number of individuals. Since at iteration
t one need only consider agents at distance at most t,
in graphs of maximum degree d (on which we focus)
the number of individuals to consider is $O((d-1)^t)$,
and the time required of each individual to compute
their action (or vote) at time t is $2^{O((d-1)^t)}$. We then
develop a sophisticated dynamic program for locally
tree-like graphs that reduces the computational effort
to $2^{O(td)}$.

We conjecture, and show supporting numerical evidence,
that on infinite trees of degree at least five, the
number of iterations needed to calculate the correct
answer with probability $1 - \epsilon$ is $O(\log\log(1/\epsilon))$. In
fact, we rigorously establish this for the 'majority
dynamics' update rule, in which agents adopt the opinion
of the majority of their neighbors in the previous round.
Thus, our conjecture follows if iterative Bayesian learning
learns at least as fast as majority, as suggested by
intuition and numerical evidence, which we present.
Assuming this conjecture, the computational effort
required drops from quasi-polynomial in $1/\epsilon$ (using
the naive dynamic program) to polylogarithmic in
$1/\epsilon$.

An additional difficulty of the Gale and Kariv
model is that it requires the individuals to exactly
know the structure of the graph. A possible solu- 
tion to this is a modification that allows the agents 
to know only their own neighborhoods and the distri- 
bution from which the rest of the graph was picked. 
We pursue this for the natural configuration model of 
random graphs (see below for full explanation) and 
show that the same computational upper bounds ap- 
ply here. 

We also introduce two further features into the 
model and show how to deal with them algorithmi- 
cally. First, there may be a finite number of 'hub' 
nodes who are each observed by many nodes leading 
to several short loops in the connectivity graph. We 
show that our algorithm can be suitably modified for 
this case. Second, we consider that nodes may not all 
be 'active' in each round, and that nodes may observe 
only a random subset of active neighbors. We show 
that this can be handled when 'inactive' edges/nodes 
occur independently of each other and in time. 

The key technique used in this paper is the dy- 
namic cavity method, introduced by Kanoria and 
Montanari [10] in their study of "recursive majority"
updates on trees, which was also motivated by social 
learning. A dynamical version of the cavity method of 
Statistical Physics, this technique was used to analyze 
majority dynamics on trees, and appears promising 
for the analysis of iterative tree processes in general. 
In this work, we use this technique for the first time to 
give an algorithm for efficient computation by nodes. 
This is in contrast to the case of majority updates, 
where the update rule is computationally trivial. Our 
algorithmic approach leveraging the dynamic cavity 
method may be applicable to a range of iterative up- 
date situations on locally treelike graphs. 

2 Model 

• There is a true state of the world $s \in S$, where S
is finite. The prior distribution $P[s]$ is common
knowledge.

• Let $G = (V, E)$ be an undirected connected
graph of agents and their social ties. Let $n = |V|$.

• Denote by $\partial i$ the neighbors of agent i, not
including i.

• Each agent i receives a private signal $x_i \in \mathcal{X}$,
where $\mathcal{X}$ is finite. Private signals are independent
conditioned on s. The distribution $P[x_i \mid s]$
is common knowledge. We assume that the signal
is informative, so that $P[x_i \mid s]$ is different for
different values of s.
• We identify the set of actions available to agents
with the set S of the states of the world (thus
we call the actions 'votes'). For each state of the
world s, action $\sigma$ has utility one when the state of
the world is $s = \sigma$, and zero otherwise. Thus the
action that maximizes the expected utility corresponds
to the maximum a posteriori probability
(MAP) estimator of the state of the world.

• At each time period $t \in \{0, 1, 2, \ldots\}$ each agent
takes an action and then observes the actions
taken by her neighbors.

• Denote by $\mathcal{F}_i^t$ the information available to agent
i at time t. We do not include in this her neighbors'
votes at time t.

• At each time period, the agents' goal is to maximize
their expected utility. They are myopic and
fully Bayesian, and so at time t agent i takes action
$\arg\max_{s \in S} P[s \mid \mathcal{F}_i^t]$, using some tie breaking
rule if necessary. This tie breaking rule is also
common knowledge.

• $\sigma_i(t)$ denotes agent i's action at time t. $\sigma_i^t =
\{\sigma_i(t') \mid t' \le t\}$ denotes all of agent i's actions,
up to and including time t. Then $\mathcal{F}_i^t$ includes
$x_i$, $\{\sigma_j^{t-1} \mid j \in \partial i\}$ and $\sigma_i^{t-1}$ (which is actually a
function of the first two in case of a deterministic
tie breaking rule, see below).

• We refer to $\sigma_i$ as i's trajectory.

• Denote $\sigma_{\partial i}^t = \{\sigma_j^t \mid j \in \partial i\}$. We assume a
deterministic tie-breaking rule, so that $\sigma_i^t$ is a
deterministic function of $x_i$ and $\sigma_{\partial i}^{t-1}$. To differentiate
the random variable $\sigma_i^t$ from the function used to
calculate it, we denote the function by $g_i^t$:

$$\sigma_i^t = g_i^t(x_i, \sigma_{\partial i}^{t-1}).$$

For convenience, we also define the scalar functions
$g_{i,t}(x_i, \sigma_{\partial i}^{t-1})$ corresponding to $\sigma_i(t)$, so that
$g_i^t = (g_{i,0}, g_{i,1}, \ldots, g_{i,t})$.

3 A Simple Algorithm 

A sign of the complexity of this Bayesian calculation 
is that even the brute-force solution for it is not triv- 
ial. We therefore describe it here. 

One way of thinking of the agents' calculation is to 
imagine that they keep a long list of all the possible 
combinations of initial signals of all the other agents, 
and at each iteration cross out entries that are incon- 
sistent with the signals that they've observed from 
their neighbors up to that point. Then, they calculate
the probabilities of the different possible states
of the world by summing over the entries that have
yet to be crossed out.

This may not be as simple as it seems. To under- 
stand which initial configurations are ruled out by a 
signal coming from a neighbor, an agent must "simulate"
that neighbor's behavior, and so each agent
must calculate the function $g_i^t$ for every other agent
i and every possible set of observations by i. We
formalize this below.

Let $x \in \mathcal{X}^n$ be the vector of private signals $(x_i)_{i \in V}$.
The trajectory of i, $\sigma_i^t$, is a deterministic function of
x. Assume then that up to time t - 1 each agent
has calculated the trajectory $\sigma_i^{t-1}(x)$ for all possible
private signal vectors x and all agents i. This is trivial
for t - 1 = 0.

We say that $y \in \mathcal{X}^n$ is feasible for i at time t if
$x_i = y_i$ and $\sigma_{\partial i}^{t-1} = \sigma_{\partial i}^{t-1}(y)$. We denote this set of feasible
private signal vectors $I_i^t(x_i, \sigma_{\partial i}^{t-1})$. To calculate $\sigma_i^t(x)$,
one need only note that

$$P[s \mid \mathcal{F}_i^t] \propto P[s]\, P[x_i, \sigma_{\partial i}^{t-1} \mid s]
= P[s] \sum_{y \in I_i^t(x_i, \sigma_{\partial i}^{t-1})} P[y \mid s]$$

and

$$g_{i,t}(x_i, \sigma_{\partial i}^{t-1}) = \arg\max_{s \in S} P[s \mid \mathcal{F}_i^t]$$

by definition. We use the standard abusive notation
$P[x_i]$ instead of $P[x_i = y_i]$, $P[\sigma_j^t]$ instead of
$P[\sigma_j^t = \omega_j^t]$, etc.

It is easy to verify that using this the calculation of
each $\sigma_i^t(x)$ takes $O(tn|\mathcal{X}|^n)$. One can do better than
perform each of these separately, but in any case the
result is exponential in n, so we derive a rough upper
bound of $2^{O(n)}$ for this method. Since we are in
particular interested in graphs of maximum degree d, we
note that up to time t an agent need only perform this
for agents at distance at most t, and so this bound
becomes $2^{O((d-1)^t)}$ for large graphs, i.e., graphs for
which $n > (d-1)^t$ for relevant values of t.
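
The list-crossing procedure of this section can be illustrated by a minimal Python sketch. It is not the authors' implementation: the function name `brute_force_trajectories`, the dictionary-based inputs and the particular tie-breaking rule (own signal if it is a maximizer, otherwise the smallest maximizing state) are assumptions made for illustration. Exact rational arithmetic is used so that posterior ties are detected exactly.

```python
from fractions import Fraction
from itertools import product


def brute_force_trajectories(neighbors, prior, likelihood, signals, horizon):
    """Enumerate all private-signal vectors and compute every agent's votes.

    neighbors:  dict mapping each agent to the list of agents she observes
    prior:      dict mapping state s to P[s]
    likelihood: dict mapping (x, s) to P[x | s]
    signals:    list of possible private signal values
    horizon:    last time period to compute
    Returns a dict: signal vector -> {agent: (vote at 0, ..., vote at horizon)}.
    """
    agents = sorted(neighbors)
    pos = {v: k for k, v in enumerate(agents)}
    states = sorted(prior)
    vectors = list(product(signals, repeat=len(agents)))

    # P[s] * P[y | s] for every signal vector y and state s.
    joint = {}
    for y in vectors:
        for s in states:
            w = prior[s]
            for x in y:
                w = w * likelihood[(x, s)]
            joint[(y, s)] = w

    traj = {y: {v: () for v in agents} for y in vectors}
    for t in range(horizon + 1):
        votes = {}
        for v in agents:
            # Vectors that share v's own signal and v's observed neighbor
            # trajectories form one feasible set; group them accordingly.
            groups = {}
            for y in vectors:
                seen = tuple(traj[y][u] for u in neighbors[v])
                groups.setdefault((y[pos[v]], seen), []).append(y)
            for (own_signal, _seen), feasible in groups.items():
                post = {s: sum(joint[(y, s)] for y in feasible) for s in states}
                best = max(post.values())
                top = [s for s in states if post[s] == best]
                # Assumed tie-breaking rule: own signal if it is a maximizer,
                # otherwise the smallest maximizing state.
                vote = own_signal if own_signal in top else top[0]
                for y in feasible:
                    votes[(y, v)] = vote
        for y in vectors:
            for v in agents:
                traj[y][v] = traj[y][v] + (votes[(y, v)],)
    return traj


# Path graph 0 - 1 - 2, binary state, signals wrong with probability 3/10.
path = {0: [1], 1: [0, 2], 2: [1]}
prior = {0: Fraction(1, 2), 1: Fraction(1, 2)}
likelihood = {(x, s): Fraction(7, 10) if x == s else Fraction(3, 10)
              for x in (0, 1) for s in (0, 1)}
result = brute_force_trajectories(path, prior, likelihood, [0, 1], horizon=2)
print(result[(0, 1, 1)])
```

In this toy example with signals (0, 1, 1), the end agent holding the minority signal keeps her vote at t = 1 (a tie, broken by her own signal) and switches at t = 2, once the middle agent's second vote reveals the third signal. The table of trajectories has $|\mathcal{X}|^n$ rows, which is the exponential dependence on n discussed above.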

4 The Dynamic Cavity Algorithm on Trees

Assume in this section that the graph G is a tree
with finite degree nodes. For $j \in \partial i$ let $G_{j \to i} =
(V_{j \to i}, E_{j \to i})$ denote j's connected component in the
graph G with the edge (i, j) removed. That is, $V_{j \to i}$
is j's subtree when G is rooted in i.
4.1 The Dynamic Cavity Method

We consider a modified process where agent i is replaced
by a zombie which takes fixed actions $\tau_i =
(\tau_i(0), \tau_i(1), \ldots)$, and the true state of the world is
assumed to be some fixed s. Furthermore, this 'fixing'
goes unnoticed by the agents (except i, who is a
zombie anyway) who carry on their calculations, assuming
i is her regular Bayesian self, and the state of
the world is drawn randomly according to $P[s]$. We
denote by $Q[A \| \tau_i, s]$ the probability of event A in this
modified process. This modified process is easier to
analyze, as the processes on each of the subtrees $V_{j \to i}$
are independent. This is formalized in the following
claim, without proof:

Claim 4.1.

$$Q[\sigma_{\partial i}^t \| \tau_i^t, s] = \prod_{j \in \partial i} Q[\sigma_j^t \| \tau_i^t, s]. \tag{1}$$

(Since $\sigma_j^t$ is unaffected by $\tau_i(t')$ for all t' > t, we
only need to specify $\tau_i^t$, and not the entire $\tau_i$.)

Now, it might so happen that for some number of
steps the zombie behaves exactly as may be expected
of a rational player. More precisely, given $\sigma_{\partial i}^{t-1}$, it
may be the case that $\tau_i^t = g_i^t(x_i, \sigma_{\partial i}^{t-1})$. This event
provides the connection between the modified process
and the original process, and is the inspiration for the
following theorem.

Theorem 4.2. For all i, t and $\tau_i$,

$$P[\sigma_{\partial i}^{t-1} \mid s, x_i]\, \mathbf{1}\big(\tau_i^t = g_i^t(x_i, \sigma_{\partial i}^{t-1})\big)
= Q[\sigma_{\partial i}^{t-1} \| \tau_i, s]\, \mathbf{1}\big(\tau_i^t = g_i^t(x_i, \sigma_{\partial i}^{t-1})\big). \tag{2}$$

Proof. We couple the original process, after choosing
s, to the modified process by setting the private
signals to be identical in both.

Now, clearly if it so happens that $\tau_i^t = g_i^t(x_i, \sigma_{\partial i}^{t-1})$
then the two processes will be identical up to time
t. Hence the probabilities of events measurable
up to time t will be identical when multiplied by
$\mathbf{1}(\tau_i^t = g_i^t(x_i, \sigma_{\partial i}^{t-1}))$, and the theorem follows. □

Using Eqs. (1) and (2) we can easily write the
posterior on s computed by node i at time t, in terms
of the probabilities $Q[\cdot \| \cdot]$:

$$P[s \mid \mathcal{F}_i^t] \propto P[s]\, P[x_i, \sigma_{\partial i}^{t-1} \mid s]
= P[s]\, P[x_i \mid s]\, P[\sigma_{\partial i}^{t-1} \mid s, x_i]
= P[s]\, P[x_i \mid s] \prod_{j \in \partial i} Q[\sigma_j^{t-1} \| \sigma_i^{t-1}, s]. \tag{3}$$

Given that, the decision function is as before:

$$g_{i,t}(x_i, \sigma_{\partial i}^{t-1}) = \arg\max_{s \in S} P[s \mid \mathcal{F}_i^t]. \tag{4}$$

As mentioned before, we assume there is a deterministic
tie breaking rule that is common knowledge.

We are finally left with the task of calculating
$Q[\cdot \| \cdot]$. The following theorem is the heart of the
dynamic cavity method and allows us to perform this
calculation:

Theorem 4.3. For $j \in \partial i$ and $t \in \mathbb{N}$,

$$Q[\sigma_j^t \| \tau_i^t, s] = \sum_{\sigma_1^{t-1}, \ldots, \sigma_{d-1}^{t-1}} \sum_{x_j} P[x_j \mid s]\,
\mathbf{1}\big[\sigma_j^t = g_j^t(x_j, (\tau_i^{t-1}, \sigma_{\partial j \setminus i}^{t-1}))\big]
\prod_{l=1}^{d-1} Q[\sigma_l^{t-1} \| \sigma_j^{t-1}, s], \tag{5}$$

where the neighbors of node j are $\partial j = \{i, 1, 2, \ldots, d-1\}$.

We mention without proof that the recursion easily
generalizes to the case of a random tie-breaking
rule, provided the rule is common knowledge; it is a
matter of replacing the expression $\mathbf{1}[\sigma_j^t = \cdots]$ with
$P[\sigma_j^t = \cdots]$, where this probability is over the
randomness of the rule. Eq. (3) continues to be valid in
this case.

The following proof is similar to the proof of
Lemma 2.1 in [10], where the dynamic cavity method
is introduced and applied to a different process.



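Proof. In the modified process the events in the different
branches that i sees are independent. We
therefore consider $V_{j \to i}$ only, and view it as a tree
rooted at j. Also, for convenience we define $\sigma_i^t \equiv \tau_i^t$;
note that the random variable $\sigma_i^t$ does not exist in
the modified process, as i's trajectory is fixed to $\tau_i$.

Let x be the vector of private signals of j and all
the vertices up to a distance t from j (call this set
of vertices $V_{j \to i}^t$). For each $l \in \{1, \ldots, d-1\}$, let
$x_l$ be the vector of private signals of $V_{l \to j}^{t-1}$. Thus,
$x = (x_j, x_1, x_2, \ldots, x_{d-1})$.

The trajectory $\sigma_j^t$ is a function (deterministic, by
our assumption) of x and $\tau_i^t$. We shall denote this
function by $F_{j \to i}^t$ and write $\sigma_j^t = F_{j \to i}^t(x, \tau_i^t)$. This
function is uniquely determined by the update rules
$g_l^t(x_l, \sigma_{\partial l}^{t-1})$ for $l \in V_{j \to i}^t$.

We have therefore

$$Q[\sigma_j^t = \lambda^t \| \tau_i^t, s] = \sum_{x} P[x \mid s]\, \mathbf{1}\big(\lambda^t = F_{j \to i}^t(x, \tau_i^t)\big). \tag{6}$$

(Note that $\sigma_l^{t-1}$ is a deterministic function of
$(x_l, \sigma_j^{t-1})$.)

We now analyze each of the terms appearing in this
sum. Since the initialization is i.i.d., we have

$$P[x \mid s] = P[x_j \mid s]\, P[x_1 \mid s]\, P[x_2 \mid s] \cdots P[x_{d-1} \mid s]. \tag{7}$$

The function $F_{j \to i}^t(\cdots)$ can be decomposed as follows:

$$\mathbf{1}\big(\lambda^t = F_{j \to i}^t(x, \tau_i^t)\big) = \sum_{\sigma_1^{t-1}, \ldots, \sigma_{d-1}^{t-1}}
\mathbf{1}\big[\lambda^t = g_j^t(x_j, (\tau_i^{t-1}, \sigma_{\partial j \setminus i}^{t-1}))\big]
\prod_{l=1}^{d-1} \mathbf{1}\big(\sigma_l^{t-1} = F_{l \to j}^{t-1}(x_l, \lambda^{t-1})\big). \tag{8}$$

Using Eqs. (7) and (8) in Eq. (6) and separating terms
that depend only on $x_l$, we get

$$Q[\sigma_j^t = \lambda^t \| \tau_i^t, s] = \sum_{\sigma_1^{t-1}, \ldots, \sigma_{d-1}^{t-1}} \sum_{x_j} P[x_j \mid s]\,
\mathbf{1}\big[\lambda^t = g_j^t(x_j, (\tau_i^{t-1}, \sigma_{\partial j \setminus i}^{t-1}))\big]
\prod_{l=1}^{d-1} \sum_{x_l} P[x_l \mid s]\, \mathbf{1}\big(\sigma_l^{t-1} = F_{l \to j}^{t-1}(x_l, \lambda^{t-1})\big).$$

The recursion follows immediately by identifying
that the product over l in fact has argument
$Q[\sigma_l^{t-1} \| \lambda^{t-1}, s]$. □

4.2 The Agents' Calculations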
4.2 The Agents' Calculations 

We now have in place all we need to perform the
agents' calculations. At time t = 0 these calculations
are trivial. Assume then that up to time t each agent
has calculated the following quantities:

1. $Q[\sigma_j^{t-1} \| \tau_i^{t-1}, s]$, for all $s \in S$ and for all $i, j \in V$
such that $j \in \partial i$, and for all $\tau_i^{t-1}$ and $\sigma_j^{t-1}$.

2. $g_i^t(x_i, \sigma_{\partial i}^{t-1})$ for all i, $x_i$ and $\sigma_{\partial i}^{t-1}$.

Note that these can be calculated without making any
observations; only knowledge of the graph is needed.

At time t + 1 each agent makes the following
calculations:

1. $Q[\sigma_j^t \| \tau_i^t, s]$ for all s, i, j, $\sigma_j^t$, $\tau_i^t$. These can be
calculated using Eq. (5), given the quantities
from the previous iteration.

2. $g_i^{t+1}(x_i, \sigma_{\partial i}^t)$ for all i, $x_i$ and $\sigma_{\partial i}^t$. These can be
calculated using Eqs. (3) and (4) and the
newly calculated $Q[\sigma_j^t \| \tau_i^t, s]$.

Since agent j calculates $g_i^{t+1}$ for all i, she in
particular calculates $g_j^{t+1}$. Therefore, she can use this
to calculate her next action, once she observes her
neighbors' actions. A simple calculation yields the
following lemma.



Lemma 4.4. In a tree graph G with maximum degree
d, the agents can calculate their actions up to time t
with computational effort $n 2^{O(td)}$.

In fact, each agent does not need to perform calculations
for the entire graph. It suffices for node i
to calculate quantities up to time t' for nodes at distance
t - t' from node i (there are at most $(d-1)^{t-t'}$
such nodes). A short calculation yields an improved
bound on computational effort.

Theorem 4.5. In a tree graph G with maximum degree
d, each agent can calculate her action up to time
t with computational effort $2^{O(td)}$.

4.3 Dynamic Cavity Algorithm: Extensions

Our algorithm admits several extensions that we explore
in this section: Section 4.3.1 discusses graphs
with loops and 'hubs', Section 4.3.2 discusses random
graphs, Section 4.3.3 relaxes the assumption that the
entire graph is common knowledge and Section 4.3.4
allows nodes/edges to be inactive in some rounds.

First we mention some straightforward generaliza- 
tions: 

It is easy to see that the dynamic cavity recursion
(Theorem 4.3) does not depend on any special properties
of the Bayesian update rule. The update rule
$g_{i,t}(\cdot)$ can be arbitrary. Thus, if agent i wants to perform
a Bayesian update, he can do so (exactly) using
our approach even if his neighbor, agent j, is using
some other update rule.

Remark 4.6. The dynamic cavity recursion can be 
used to enable computations of agents even if some 
of them are using arbitrary update rules (provided the 
rules are 'well specified' and common knowledge). 

For instance, our approach should be applicable in 
'partial Bayesian' settings. 

Our algorithm is easily modified for the case of a
general finite action set A that need not be the same
as S, associated with a payoff function $u : A \times S \to \mathbb{R}$.
Moreover the action set and payoff function can each
be player dependent ($A_i$, $u_i$ respectively).

We already mentioned that there is a simple gen- 
eralization to the case of random tie breaking rules 
(that are common knowledge). 

Instead of having only undirected edges (corre- 
sponding to bidirectional observations), we can al- 
low a subset of the edges of the tree to be directed. 
In this case, the same algorithm works with suitably 
defined neighborhood sets di. In other words, our 
result holds for the class of directed graphs lacking 
cycles of length greater than two (which correspond 
to undirected edges). 
4.3.1 Loops and Hub nodes

A class of graphs that are not trees, but to which
this dynamic cavity method can be easily extended,
is that of trees with additional 'hub' nodes.

Consider then a graph that is not a tree, but can
be transformed into a tree by the removal of some
small set of nodes $V_{\text{loop}} \subset V$. Then the same calculations
above can still be performed, with a time
penalty of $|\mathcal{X}|^{|V_{\text{loop}}|}$; the calculation is simply
repeated for each possible set of private signals
of the hub nodes, and the required probabilities are
arrived at by averaging the $|\mathcal{X}|^{|V_{\text{loop}}|}$ different possible
cases. In fact, one may not even need to average
over all nodes in $V_{\text{loop}}$, since at iteration t only those
inside $B_i^t$ (the ball of radius t around i) affect the
outcome of i's calculations. Hence the complexity
of calculations up to iteration t is now $|\mathcal{X}|^{n_t} 2^{O(td)}$,
where $n_t = \max_{i \in V} |B_i^t \cap V_{\text{loop}}|$.

If we also allow directed edges in this model, then
we can extend it to include nodes of unlimited in-degree,
i.e., nodes that may be observed by an unbounded
number (perhaps an infinity) of other agents,
in the spirit of Bala and Goyal's "royal
family" [T]. We call such nodes 'hubs' for obvious
reasons. For instance, a popular blogger or a newspaper
might constitute such a hub. Here too the same
computational guarantees hold.

4.3.2 Random graphs 

Consider a random graph on n nodes drawn from the
configuration model with a given degree distribution (*).
It is well known that such graphs are locally tree-like
with high probability (see, e.g., [?]). More formally,
for any $t < \infty$ we have

$$\lim_{n \to \infty} P[B_i^t \text{ is a tree}] = 1. \tag{9}$$

Since node calculations up to time t depend only on
$B_i^t$, it follows that with high probability (w.h.p.), for
an arbitrarily selected node, the tree calculations suffice
for any constant number of iterations (**). As we
show in Section 5, just $O(\log\log(1/\epsilon))$ iterations (a
small number independent of n) are enough to learn
the true state of the world with probability at least
$1 - \epsilon$ for any $\epsilon > 0$, provided private signals are not
too noisy. Thus, our computational approach works
for random graphs w.h.p.

(*) In the configuration model, one first assigns a degree to
each node, draws the appropriate number of 'half-edges' and
then chooses a uniformly random pairing between them. One
can further specify that a graph constructed thus is 'rejected'
if it contains double edges or self-loops; this does not change
any of the basic properties, e.g., the local description, of the
ensemble.

(**) In fact, as mentioned earlier, nodes with a small number
of loops in the vicinity can also do their calculations without
trouble.
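
The half-edge construction described in the configuration-model footnote can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function name `sample_configuration_model`, the `max_tries` cap and the representation of the graph as a set of sorted pairs are choices made here, not part of the paper.

```python
import random


def sample_configuration_model(degrees, rng, max_tries=10_000):
    """Sample a simple graph with the given degree sequence.

    Each node v contributes degrees[v] half-edges; a uniformly random
    pairing of half-edges is drawn and rejected if it produces a
    self-loop or a double edge. Returns a set of edges (u, v) with u < v.
    """
    if sum(degrees) % 2 != 0:
        raise ValueError("the degrees must sum to an even number")
    for _attempt in range(max_tries):
        stubs = [v for v, d in enumerate(degrees) for _k in range(d)]
        # Shuffling and pairing consecutive entries gives a uniform pairing.
        rng.shuffle(stubs)
        edges = set()
        simple = True
        for a, b in zip(stubs[0::2], stubs[1::2]):
            edge = (min(a, b), max(a, b))
            if a == b or edge in edges:
                simple = False  # self-loop or double edge: reject
                break
            edges.add(edge)
        if simple:
            return edges
    raise RuntimeError("no simple graph found within max_tries")


edges = sample_configuration_model([3] * 10, random.Random(0))
```

Rejection sampling is used here because it matches the 'rejected' variant in the footnote; for degree sequences with very large degrees, the acceptance probability can become small and a different sampler may be preferable.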

4.3.3 Learning without Knowledge of the 
Graph 

Here we consider the situation where nodes do not
know the actual graph G, but know some distribution
over possibilities for G. This is potentially a
more realistic model. Moreover, the assumption that
agents know the entire graph structure
is considered a weakness of the model of Gale
and Kariv. We address this issue here, showing that
our algorithm can be modified to allow Bayesian
estimation in this case as well.

Let $G = G_n$ be a random graph of n nodes constructed
according to the configuration model for a
given (node perspective) degree distribution. Denote
the degree distribution by $p_V$, so that $p_V(d)$ is the
probability that a randomly selected node has degree d.

Now, in this ensemble, the local neighborhood up
to distance D of an arbitrary node v with fixed degree
$d_v$ converges in distribution as $n \to \infty$ to the following:
Each of the neighbors of node v has a degree
drawn independently according to the 'edge perspective'
degree distribution $p_E$, defined by:

$$p_E(d) = \frac{d\, p_V(d)}{\sum_{d'} d'\, p_V(d')}.$$

Further, each of the neighbors of the neighbors (except
v itself) again has a degree drawn independently
according to $p_E(d)$, and so on up to depth
D. Call the resulting distribution over trees $\mathcal{T}_{d_v}^D$.
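
As a small worked illustration of the edge-perspective distribution, the following Python sketch computes $p_E$ from $p_V$ by size-biasing. The function name `edge_perspective` and the dictionary representation of the distribution are assumptions made for this example.

```python
from fractions import Fraction


def edge_perspective(p_v):
    """Return p_E(d) = d * p_V(d) / sum_{d'} d' * p_V(d').

    p_v maps each degree d to the probability that a uniformly random
    node has degree d. Degree-0 nodes have no edges, so they receive
    no mass in the edge-perspective distribution and are omitted.
    """
    mean_degree = sum(d * p for d, p in p_v.items())
    if mean_degree == 0:
        raise ValueError("the mean degree must be positive")
    return {d: d * p / mean_degree for d, p in p_v.items() if d > 0}


# Half the nodes have degree 1 and half have degree 3.
p_e = edge_perspective({1: Fraction(1, 2), 3: Fraction(1, 2)})
```

Here a neighbor reached by following an edge has degree 3 with probability 3/4, although only half of the nodes have degree 3: high-degree nodes are over-represented among neighbors.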

Now suppose that agents are, in fact, connected
in a graph drawn from the ensemble $G_n$ with degree
distribution $p_V$. Suppose that each node u knows
the distribution $p_V$ and its own degree $d_u$, but does
not know anything else about $G_n$ (*). Further, suppose
that this is common knowledge. Now in the limit
$n \to \infty$, an exact Bayesian calculation for a node
v up to time t depends on $p_V$ via the tree distribution
$\mathcal{T}_{d_v}^t$. Since nodes
know only their own degree, there are only $\Delta$ different
'types' of nodes, where $\Delta$ is the size of the support
of $p_E(d)$. There is one type for each degree. This
actually makes computations slightly simpler than in
an arbitrary known graph.

(*) Other 'knowledge' assumptions can be similarly handled,
for instance where a node knows its own degree, the degrees of
its neighbors and $p_V$.

Fix state s. Take an arbitrary node i. Make it a
'zombie' following the vote trajectory $\tau_i$. Now fix
some $\partial i$ (ensure $p_V(|\partial i|) > 0$). Choose arbitrary
$j \in \partial i$. Define $Q[\sigma_j^t = \omega_j^t \| \tau_i^t, s]$ as the probability
of seeing trajectory $\sigma_j^t = \omega_j^t$ at node j in this setting.
This probability is over the graph realization
(given $\partial i$) and over the private signals. Note here
that $Q[\sigma_j^t = \omega_j^t \| \tau_i^t, s]$ is the same for any i, $\partial i$ and
$j \in \partial i$.

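Eqs. (1), (2) and (3) continue to hold w.h.p., for
the same reasons as before.

The dynamic cavity recursion, earlier given by
Eq. (5), becomes

$$Q[\sigma_j^t \| \tau_i^t, s] = \sum_{d} p_E(d) \sum_{\sigma_1^{t-1}, \ldots, \sigma_{d-1}^{t-1}} \sum_{x_j} P[x_j \mid s]\,
\mathbf{1}\big[\sigma_j^t = g_j^t(x_j, (\tau_i^{t-1}, \sigma_{\partial j \setminus i}^{t-1}))\big]
\prod_{l=1}^{d-1} Q[\sigma_l^{t-1} \| \sigma_j^{t-1}, s]. \tag{10}$$

We have written the recursion assuming the neighbors
of j other than i are named according to
$\partial j \setminus i = \{1, 2, \ldots, d-1\}$.
Again, this holds w.h.p. with respect to n.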

We comment that there is a straightforward gen- 
eralization to the case of a multi-type configuration 
model with a finite number of types. Nodes may or 
may not be aware of the type of each of their neigh- 
bors (both cases can be handled). For instance, here 
is a simple example with two types: There are 'red' 
agents and 'blue' agents, and each 'red' agent is con- 
nected to 3 'blue' agents, whereas each 'blue' agent 
is connected to either 5 or 6 'red' agents with equal 
likelihood. In this case the degree distribution itself 
ensures that nodes know the type of their neighbors 
as being the opposite of their own type. Multi-type 
configuration models are of interest since they allow 
for a rich variety of 'social connection' patterns.

4.3.4 Observing random subsets of neighbors 

We may not interact with each of our friends every
day. Suppose that for each edge e, there is a probability
$p_e$ that the edge will be 'active' in any particular
iteration, independent of everything else. Let
$a_e(t) \in \{*, a\}$ be an indicator variable for whether
edge e was active at time t (a denotes 'active'). Now,
the observation by node i of node j belongs to an extended
set that includes an additional symbol * corresponding
to the edge being inactive. Thus, there are
$(|S| + 1)^{t+1}$ possible observed trajectories up to time
t. Our algorithm can be easily adapted for this case.
The modified 'zombie' process involves fixing the state
of the world s, the trajectory $\tau_i$ and also $(a_{ij}(t))_{j \in \partial i}$ for
all times t. The form of the posterior on the state of the
world, Eq. (3), remains unchanged. The cavity recursion
Eq. (5) now includes a summation over the possibilities
for $(a_1^t, \ldots, a_{d-1}^t)$. The overall complexity
remains $2^{O(td)}$.

The case where node v becomes inactive with some
probability $p_v$ in an iteration, independent of everything
else, can also be handled similarly. A suitable
formulation can also be obtained when both the
above situations are combined, so that both nodes
and edges may be inactive in an iteration.

Footnote (to 'w.h.p.' in Section 4.3.3): We need the ball
of radius t around j to be a tree.

5 Rapid learning on trees 

We say that there is doubly exponential convergence
to the state of the world s if the error probability
$P[\sigma_i(t) \ne s]$ decays with round number t as

$$-\log\big(P[\sigma_i(t) \ne s]\big) \in \Omega(b^t), \tag{11}$$

where b > 1 is some constant.

The following is an immediate corollary of Theorem 4.5.

Corollary 5.1. Consider iterative Bayesian learning
on a tree with maximum degree d. If we have
doubly exponential convergence to s, then computational
effort that is polylogarithmic in $1/\epsilon$ suffices
to achieve error probability $P[\sigma_i(t) \ne s] \le \epsilon$.
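
The step from Theorem 4.5 to the corollary can be spelled out as follows. This is a sketch of the calculation rather than text from the paper; the constant c > 0 is an assumed name for the constant hidden in the $\Omega(\cdot)$ of Eq. (11), and d is treated as fixed.

```latex
% Doubly exponential convergence, with an explicit (assumed) constant c > 0:
%   -\log P[\sigma_i(t) \ne s] \ge c\, b^t  for all large enough t.
\begin{align*}
P[\sigma_i(t) \ne s] \le \epsilon
  &\quad\text{whenever}\quad c\, b^t \ge \log(1/\epsilon), \\
  &\quad\text{i.e., whenever}\quad
   t \ge \log_b\!\Big(\tfrac{1}{c}\log(1/\epsilon)\Big)
     = O\big(\log\log(1/\epsilon)\big). \\
\intertext{Substituting this value of $t$ into the bound of Theorem 4.5 gives}
2^{O(td)} &= 2^{O(d \log\log(1/\epsilon))}
           = \big(\log(1/\epsilon)\big)^{O(d)},
\end{align*}
% which is polylogarithmic in 1/\epsilon for fixed d.
```

By the same substitution, the naive bound $2^{O((d-1)^t)}$ becomes $2^{(\log(1/\epsilon))^{O(1)}}$, which is the quasi-polynomial dependence on $1/\epsilon$ mentioned in the introduction.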

We are handicapped by the fact that very little 
is known rigorously about convergence of iterative
Bayesian learning. Nevertheless we provide the fol- 
lowing evidence for doubly exponential convergence 
on trees: 

In Section 5.1 we study a simple case with two
possible states of the world and two possible private
signal values on a regular directed tree. We show
that except for the case of very noisy signals, we have
doubly exponential convergence if the degree is at
least five.

Next, in Section 5.2 we state a conjecture and show
that it implies doubly exponential convergence of iter- 
ative Bayesian learning also on undirected trees. We 
provide numerical evidence in support of our conjec- 
ture. 

5.1 Directed trees 

Consider an infinite directed d-ary tree. By this we 
mean a tree graph where each node i has one 'parent' 
who observes i and d 'children' whom i observes, but 
who do not observe i. Learning in such a tree is much 
easier to analyze (than an undirected tree) because 
the trajectories of the d children are uncorrelated, 
given s. 

We assume a binary state of the world s and independent
binary signals that are each incorrect with
probability $\delta$.
Lemma 5.2. On an infinite directed d-ary tree, the
error probability (at any node) at time t is bounded
above by $\delta_t$, where $\delta_0 = \delta$ and we have the recursive
definition

$$\delta_t = P[\text{Binomial}(d, \delta_{t-1}) \ge d/2]. \tag{12}$$

Proof. We proceed by induction on time t. Clearly,
the error probability is bounded above by $\delta_0$ at t = 0.
Suppose $P[\sigma_i(t) \ne s] \le \delta_t$. Consider a node j
making a decision at time t + 1. Let the children of j
be 1, 2, ..., d. Define $\hat{\sigma}_j$, the opinion of the majority
of the children, by

$$\hat{\sigma}_j(t+1) = \text{sgn}\Big(\sum_{i=1}^{d} \sigma_i(t)\Big),$$

where sgn(0) is arbitrarily assigned the value -1 or
+1. The 'error-or-not' variables $[\sigma_i(t) \ne s]$ are iid,
with $P[\sigma_i(t) \ne s] \le \delta_t$ by the induction hypothesis.
Hence,

$$P[\hat{\sigma}_j(t+1) \ne s] \le P[\text{Binomial}(d, \delta_t) \ge d/2] = \delta_{t+1}. \tag{13}$$

Since the agent j is Bayesian, she in fact uses the
information $(x_j, \sigma_1^t, \ldots, \sigma_d^t)$ to compute a MAP estimate
$\sigma_j(t+1)$ of the true state of the world. Clearly,
$P[\sigma_j(t+1) \ne s] \le P[\hat{\sigma}_j(t+1) \ne s]$. Using Eq. (13),
it follows that $P[\sigma_j(t+1) \ne s] \le \delta_{t+1}$. Induction
completes the proof. □



It follows (by an argument similar to the one used in the
proof of Theorem 5.4 below) that we have doubly exponential
convergence to the true state of the world, if the noise
level is not too high. We obtain

$$-\log P[\sigma_i(t) \neq s] \in \Omega\big((d/2)^t\big) , \qquad (14)$$

implying that $O(\log\log(1/\epsilon))$ rounds suffice to reduce
the error probability to below $\epsilon$.
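The recursion in Eq. (12) is easy to iterate numerically. The following is a minimal sketch, not part of the original analysis: the function names `binom_tail` and `directed_tree_bounds` and the parameter choices d = 5, δ = 0.15 are illustrative assumptions. It evaluates δ_t = P[Binomial(d, δ_{t-1}) ≥ d/2] by direct summation, so that the rapid decay of the bound can be inspected round by round.

```python
from math import comb


def binom_tail(n: int, p: float, k_min: int) -> float:
    """Return P[Binomial(n, p) >= k_min] by direct summation."""
    return sum(
        comb(n, k) * p**k * (1 - p) ** (n - k)
        for k in range(max(k_min, 0), n + 1)
    )


def directed_tree_bounds(d: int, delta: float, rounds: int) -> list[float]:
    """Iterate delta_t = P[Binomial(d, delta_{t-1}) >= d/2] with delta_0 = delta."""
    k_min = (d + 1) // 2  # smallest integer k satisfying k >= d/2
    bounds = [delta]
    for _ in range(rounds):
        bounds.append(binom_tail(d, bounds[-1], k_min))
    return bounds


# Illustrative parameters (assumed here, not prescribed by the lemma).
for t, bound in enumerate(directed_tree_bounds(d=5, delta=0.15, rounds=4)):
    print(f"t={t}: delta_t={bound:.3e}")
```

Printing $-\log \delta_t$ instead of $\delta_t$ makes the growth rate visible directly; for small δ_t each step roughly multiplies $-\log \delta_t$ by the number of wrong children needed for an error.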

5.2 Bayesian vs. 'majority' updates 

We conjecture that iterative Bayesian learning leads to
lower error probabilities (in the weak sense) than a very
simple alternative update rule we call "majority dynamics".
Under this rule the agents adopt the action taken by the
majority of their neighbors in the previous iteration (this
is made precise in Definition A.1). Our conjecture is
natural since the iterative Bayesian update rule chooses
the vote in each round that (myopically) minimizes the
error probability.

Conjecture 5.3. On any regular tree with independent
identically distributed private signals, the error
probability under iterative Bayesian learning is no larger
than the error probability under majority dynamics (cf.
Definition A.1) after the same number of iterations.

Round   Bayesian       Majority
0       0.15           0.15
1       2.7 · 10^-2    2.7 · 10^-2
2       7.6 · 10^-?    1.7 · 10^-?
3       2.8 · 10^-?    8.4 · 10^-?
4       1.4 · 10^-?    2.5 · 10^-?

Table 1: Error probability on regular tree with d = 5 and
$P[x_i \neq s] = 0.15$, for (i) Bayesian and (ii) majority
updates. The agents break ties by picking their original
private signals.



We use $\hat{\sigma}_i(t)$ to denote votes under the majority
dynamics.

In Appendix A we show doubly exponential convergence for
majority dynamics on regular trees:

Theorem 5.4. Assume binary s with uniform prior. Agents'
initial votes $\hat{\sigma}_i(0)$ are correct with probability
$1 - \delta$, and independent conditioned on s. Let i be any
node in an (undirected) d-regular tree for $d \geq 5$. Then,
under the majority dynamics,

$$-\log P[\hat{\sigma}_i(t) \neq s] \in \Omega\big((\tfrac{1}{2}(d-2))^t\big) \qquad (15)$$

when $\delta < \big(2e(d-1)/(d-2)\big)^{-(d-2)/(d-4)}$.
Thus, if Conjecture 5.3 holds:

• We also have doubly exponential convergence for
  iterative Bayesian learning on regular trees with
  $d \geq 5$, implying that for any $\epsilon > 0$, an error
  probability $\epsilon$ can be achieved in
  $O(\log\log(1/\epsilon))$ iterations under iterative
  Bayesian learning.

• Combining with Theorem 4.5 (cf. Corollary 5.1), we see
  that computational effort that is polylogarithmic in
  $1/\epsilon$ suffices to achieve error probability
  $\epsilon$.

This compares favorably with the quasi-poly($1/\epsilon$)
bound on computational effort that we can derive by
combining Conjecture 5.3 and the simple dynamic program
described in Section 3.

In Table 1 we provide numerical evidence on regular
undirected trees in support of our conjecture. Further
numerical results are presented in Appendix B. All
computations are exact, and were performed using the
dynamic cavity equations. The results are all consistent
with our conjecture over different values of d and
$P[x_i \neq s]$.
[Figure 1 placeholder. Horizontal axis: iteration t.]

Figure 1: Error probability decay on regular trees for
iterative Bayesian learning, with $P[x_i \neq s] = 0.3$ (cf.
Appendix B). The data used to generate this figure is
displayed in Table 3.



Figure 1 plots the decay of error probabilities in regular
trees for iterative Bayesian learning with
$P[x_i \neq s] = 0.3$, where the agents break ties by picking
their original private signals. Each of the curves (for
different values of d) in the plot of
$\log(-\log P[\sigma_i(t) \neq s])$ vs. t appears to be bounded
below by a straight line with positive slope, suggesting
doubly exponential decay of error probabilities with the
number of iterations.
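The link between a straight line in the plot and doubly exponential decay can be checked by hand: if $\log(-\log P)$ grows by at least a constant c > 0 per round, then $-\log P$ grows by a factor of at least $e^c$ per round. The sketch below applies this transformation to the d = 3 column of Table 3 for rounds 0 to 5; the variable names are illustrative, and only entries that are stated in the table are used.

```python
from math import log

# Error probabilities for d = 3, rounds 0 to 5, as listed in Table 3.
errors_d3 = [0.30, 0.22, 0.13, 7.8e-2, 3.8e-2, 1.7e-2]

# The transformation used on the vertical axis of Figure 1.
y = [log(-log(p)) for p in errors_d3]

# Per-round increments; a positive lower bound on these is what
# "bounded below by a straight line with positive slope" means.
increments = [later - earlier for earlier, later in zip(y, y[1:])]

for t, inc in enumerate(increments, start=1):
    print(f"round {t - 1} -> {t}: increment {inc:.3f}")
```

Six data points cannot establish an asymptotic rate, so this is a consistency check on the reading of the figure, not evidence beyond what the paper reports.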
A Majority dynamics: Proof of Theorem 5.4

In this section we study a very simple update rule,
'majority dynamics'. We use $\hat{\sigma}_i(t)$ to denote votes
under the majority dynamics.

Definition A.1. Under the majority dynamics, each node
$i \in V$ chooses his vote in round t + 1 according to the
majority of the votes of his neighbors in round t, i.e.

$$\hat{\sigma}_i(t+1) = \mathrm{sign}\Big(\sum_{j \in \partial i} \hat{\sigma}_j(t)\Big) ,$$

where $\partial i$ denotes the set of neighbors of i. Ties are
broken by flipping an unbiased coin.

As before, $s \in \{-1, +1\}$ is drawn from a 50-50 prior and
nodes receive 'private signals' $\hat{\sigma}_i(0)$ that are
correct with probability $1 - \delta$, and independent
conditioned on s.
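As an illustration of the update rule in Definition A.1, here is a minimal sketch of one synchronous round on a finite graph. The finite adjacency-list representation and the names `majority_step`, `votes` and `neighbors` are assumptions made for the example; the analysis in this appendix concerns infinite regular trees.

```python
import random


def majority_step(
    votes: dict[int, int],
    neighbors: dict[int, list[int]],
    rng: random.Random,
) -> dict[int, int]:
    """One round: each node adopts the sign of the sum of its neighbors' votes."""
    new_votes = {}
    for node, nbrs in neighbors.items():
        total = sum(votes[j] for j in nbrs)
        if total == 0:
            # Tie among the neighbors: flip an unbiased coin.
            new_votes[node] = rng.choice((-1, 1))
        else:
            new_votes[node] = 1 if total > 0 else -1
    return new_votes


# Hypothetical example: a star with center 0 and leaves 1, 2, 3.
neighbors = {0: [1, 2, 3], 1: [0], 2: [0], 3: [0]}
votes = {0: -1, 1: 1, 2: 1, 3: -1}
print(majority_step(votes, neighbors, random.Random(0)))
```

Note that a node's own previous vote does not enter its update; only the neighbors' votes do, exactly as in the definition.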

Consider an undirected d regular tree. The analysis 
in this case is complicated (relative to the case of 
a directed tree) by dependencies which have to be 
carefully handled. 

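Lemma A.2. Let i and j be adjacent nodes in the tree. Then
for all $(\hat{\sigma}_i^{t-1}, \hat{\sigma}_j^{t-1}) \in \{-1,+1\}^{2t}$,

$$P[\hat{\sigma}_i(t) = -1 \mid \hat{\sigma}_i^{t-1}, \hat{\sigma}_j^{t-1}, s = +1] \leq \delta_t , \qquad (16)$$

where $\delta_t$ is defined recursively by $\delta_0 = \delta$, and

$$\delta_t = P[\mathrm{Binomial}(d-1, \delta_{t-1}) \geq d/2 - 1] . \qquad (17)$$

Proof. We proceed by induction. Clearly Eq. (16) holds for
t = 0. Suppose Eq. (16) holds for some t. We want to show

$$P[\hat{\sigma}_i(t+1) = -1 \mid \hat{\sigma}_i^{t}, \hat{\sigma}_j^{t}, s = +1] \leq \delta_{t+1} , \qquad (18)$$

for all $(\hat{\sigma}_i^{t}, \hat{\sigma}_j^{t}) \in \{-1,+1\}^{2(t+1)}$.

Let $l_1, l_2, \ldots, l_{d-1}$ be the other neighbors of node i
(besides j). We will show that, in fact,

$$P[\hat{\sigma}_i(t+1) = -1 \mid \hat{\sigma}_i^{t}, \hat{\sigma}_j^{t}, \hat{\sigma}_{l_1}^{t-1}, \ldots, \hat{\sigma}_{l_{d-1}}^{t-1}, s = +1] \leq \delta_{t+1} , \qquad (19)$$

for all possible
$\xi \equiv (\hat{\sigma}_i^{t}, \hat{\sigma}_j^{t}, \hat{\sigma}_{l_1}^{t-1}, \ldots, \hat{\sigma}_{l_{d-1}}^{t-1})$.

We reason as follows. Fix the state of the world s and the
trajectories $\hat{\sigma}_i^{t}$ and $\hat{\sigma}_j^{t}$. Now this
induces correlations between the trajectories of the
neighbors $l_1, \ldots, l_{d-1}$, caused by the requirement of
consistency, but only up to time t - 1. If we further fix
$\hat{\sigma}_{l_m}^{t-1}$, then $\hat{\sigma}_{l_m}(t)$ (and
$\hat{\sigma}_{l_m}$ at all future times) is conditionally
independent of $(\hat{\sigma}_{l_{m'}})_{m' \neq m}$. Thus, we have

$$P[\hat{\sigma}_{l_m}(t) = -1 \mid \xi, s = +1] = P[\hat{\sigma}_{l_m}(t) = -1 \mid \hat{\sigma}_{l_m}^{t-1}, \hat{\sigma}_i^{t-1}, s = +1]$$

and therefore, using the induction hypothesis

$$P[\hat{\sigma}_{l_m}(t) = -1 \mid \xi, s = +1] \leq \delta_t \qquad (20)$$

for all $m \in \{1, 2, \ldots, d-1\}$. Also, the actions
$\hat{\sigma}_{l_1}(t), \ldots, \hat{\sigma}_{l_{d-1}}(t)$ are
conditionally independent of each other given $\xi$, s = +1.
We have

$$\hat{\sigma}_i(t+1) = \mathrm{sgn}\big(\hat{\sigma}_j(t) + \hat{\sigma}_{l_1}(t) + \ldots + \hat{\sigma}_{l_{d-1}}(t)\big) ,$$

with sgn(0) being assigned value -1 or +1 with equal
probability. We have

$$P[\hat{\sigma}_i(t+1) = -1 \mid \xi, s = +1] \leq P[\mathrm{Binomial}(d-1, \delta_t) \geq d/2 - 1]$$

from Eq. (20) and conditional independence of
$\hat{\sigma}_{l_1}(t), \ldots, \hat{\sigma}_{l_{d-1}}(t)$. This
yields Eq. (19). Eq. (18) follows by summing over
$\hat{\sigma}_{l_1}^{t-1}, \ldots, \hat{\sigma}_{l_{d-1}}^{t-1}$. □

Proof of Theorem 5.4. By applying the multiplicative
version of the Chernoff bound to Eq. (17) we have that

$$\delta_{t+1} \leq e^{(d-2)/2 - (d-1)\delta_t} \big(2\delta_t(d-1)/(d-2)\big)^{(d-2)/2} .$$

Dropping the term $e^{-(d-1)\delta_t}$, we obtain

$$\delta_{t+1} \leq \big(2e\delta_t(d-1)/(d-2)\big)^{\frac{1}{2}(d-2)} . \qquad (21)$$

This is a first order non-homogeneous linear recursion in
$\log \delta_t$. If it were an equality it would yield

$$\log \delta_t = \Big[\log \delta + \frac{d-2}{d-4} \log[2e(d-1)/(d-2)]\Big] \big[\tfrac{1}{2}(d-2)\big]^t - \frac{d-2}{d-4} \log[2e(d-1)/(d-2)] ,$$

and so

$$\log \delta_t \in -\Omega\big((\tfrac{1}{2}(d-2))^t\big) \qquad (22)$$

as long as

$$\log \delta < -\frac{d-2}{d-4} \log[2e(d-1)/(d-2)] .$$

□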



Theorem 5.4 is non-trivial for $d \geq 5$. The upper limit of
the noise for which it establishes rapid convergence
approaches $(2e)^{-1}$ as d grows large (see also the
discussion below for large d).
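The quantities in the proof of Theorem 5.4 can be compared numerically. The sketch below is illustrative only: the function names are assumptions, the exact step evaluates the binomial tail of Eq. (17) by direct summation, the Chernoff step evaluates the right-hand side of Eq. (21), and the noise threshold is taken to be the fixed point of recursion (21) treated as an equality, namely $(2e(d-1)/(d-2))^{-(d-2)/(d-4)}$, which requires $d \geq 5$.

```python
from math import ceil, comb, e


def exact_step(d: int, delta_t: float) -> float:
    """P[Binomial(d-1, delta_t) >= d/2 - 1], the recursion of Eq. (17)."""
    k_min = ceil(d / 2 - 1)
    n = d - 1
    return sum(
        comb(n, k) * delta_t**k * (1 - delta_t) ** (n - k)
        for k in range(k_min, n + 1)
    )


def chernoff_step(d: int, delta_t: float) -> float:
    """Right-hand side of Eq. (21): (2 e delta_t (d-1)/(d-2))^((d-2)/2)."""
    return (2 * e * delta_t * (d - 1) / (d - 2)) ** ((d - 2) / 2)


def noise_threshold(d: int) -> float:
    """Fixed point of Eq. (21) taken as an equality; needs d >= 5."""
    if d < 5:
        raise ValueError("the threshold is only defined here for d >= 5")
    return (2 * e * (d - 1) / (d - 2)) ** (-(d - 2) / (d - 4))


# Illustrative comparison for a few degrees (parameter choices are assumed).
for d in (5, 7, 11, 101):
    thr = noise_threshold(d)
    print(f"d={d}: threshold={thr:.4g}, "
          f"exact step={exact_step(d, thr / 2):.3g}, "
          f"Chernoff step={chernoff_step(d, thr / 2):.3g}")
```

The gap between the exact step and the Chernoff step gives a sense of how conservative the threshold is; as d grows the threshold approaches $1/(2e)$, in line with the remark above.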

A.1 Convergence for large d

We present now a short informal discussion on the limit
$d \to \infty$. We can, in fact, use Lemma 5.2 to show
convergence is doubly exponential for $\delta < 1/2 - c/d$ for
some $c < \infty$ that does not depend on d.

Here is a sketch of the argument. Suppose
$\delta = 1/2 - c_1/d$. Then, for all $d > d_1$ where
$d_1 < \infty$, there exists $c_2 < \infty$ such that
$P[\mathrm{Binomial}(d-1, \delta) \geq d/2 - 1] < 1/2 - c_2/\sqrt{d}$.
This can be seen, for instance, by coupling with the
Binomial(d - 1, 1/2) process and using an appropriate local
central limit theorem (e.g., see [10, Theorem 4.4]). Thus,
$\delta_1 < 1/2 - c_2/\sqrt{d}$. Further, $c_2$ can be made
arbitrarily large by choosing large enough $c_1$. Next, with
a simple application of Azuma's inequality, we arrive at
$\delta_2 < c_3$ (where $c_3 \to 0$ as $c_2 \to \infty$). Now, for
small enough $c_3$, we use the Chernoff bound analysis in the
proof of Theorem 5.4 and obtain doubly exponential
convergence.

Footnote: A more formal proof can be constructed by
mirroring the reasoning used in the proof of Theorem 4.3.

Footnote 6 (Chernoff bound):
$P[X \geq (1+\eta)E[X]] \leq \big(e^{\eta}/(1+\eta)^{1+\eta}\big)^{E[X]}$.
We substitute $E[X] = \delta_t(d-1)$ and
$1 + \eta = (d/2 - 1)/[\delta_t(d-1)]$.



Round   Bayesian       Majority
0       0.15           0.15
1       6.1 · 10^-2    6.1 · 10^-2
2       1.5 · 10^-2    3.0 · 10^-2
3       3.0 · 10^-?    1.6 · 10^-2
4       3.4 · 10^-?    9.2 · 10^-?
5       2.7 · 10^-?    5.5 · 10^-?
6       2.2 · 10^-?    3.4 · 10^-?
7       1.4 · 10^-?    3.4 · 10^-?

Table 2: d = 3, $P[x_i \neq s] = 0.15$



B Further numerical results 

Table 2, together with Table 1 above, contrasts the error
probabilities of Bayesian updates with those of majority
updates. All cases exhibit lower error probabilities (in
the weak sense) for the Bayesian update, consistent with
Conjecture 5.3. Table 3 contains the data plotted in
Figure 1. For these parameters too, we found that the
Bayesian updates showed lower error probabilities than the
majority updates (though we omit the majority results
here).

The running time to generate these tables on a standard
desktop machine was less than a minute. We did not proceed
with more rounds because of numerical instability issues
which begin to appear as error probabilities decrease.
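The round-1 entries of the tables can be cross-checked without the dynamic cavity machinery. Under majority dynamics with odd d there are no ties, so the round-1 error probability is the probability that more than half of d independent neighbor signals are wrong. The sketch below, with the assumed helper name `majority_error_round1`, computes this binomial tail; the resulting values can be compared with the round-1 rows of Table 2 (6.1 · 10^-2 for d = 3, P[x_i ≠ s] = 0.15) and Table 3 (0.22, 0.16 and 0.13 for d = 3, 5, 7 with P[x_i ≠ s] = 0.3).

```python
from math import comb


def majority_error_round1(d: int, p: float) -> float:
    """P[more than half of d independent votes are wrong], each wrong w.p. p."""
    if d % 2 == 0:
        raise ValueError("this sketch assumes odd d, so that no ties occur")
    k_min = (d + 1) // 2
    return sum(
        comb(d, k) * p**k * (1 - p) ** (d - k) for k in range(k_min, d + 1)
    )


for d, p in [(3, 0.15), (3, 0.30), (5, 0.30), (7, 0.30)]:
    print(f"d={d}, p={p}: {majority_error_round1(d, p):.4f}")
```

This only reproduces the first round, where neighbor votes are still independent given s; later rounds involve the dependencies handled in Appendix A and are not covered by this check.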



Round   d = 3          d = 5          d = 7
0       0.30           0.30           0.30
1       0.22           0.16           0.13
2       0.13           5.1 · 10^-2    1.3 · 10^-2
3       7.8 · 10^-2    4.1 · 10^-?    4.4 · 10^-?
4       3.8 · 10^-2    1.6 · 10^-?
5       1.7 · 10^-2
6       5.7 · 10^-?
7       1.5 · 10^-?

Table 3: Error probabilities with $P[x_i \neq s] = 0.3$, for
regular trees of different degrees d. This data is
displayed in Figure 1.
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